Understanding the Data

In my previous project I scraped this website for my data. That scrape alone took around 14 hours, and repeating it on top of downloading all the images would have taken far longer than I wanted. Prior to my last project, while researching Pokémon card data, I found this API service. They maintain a GitHub repository with all the Pokémon card data (including image download links) in JSON format.

Finding that GitHub repository was a lifesaver and perhaps cut development time in half. From the data they provided I was able to get all Pokémon card data from the Scarlet and Violet series:

Set ID     Set Name                              Number of Cards   Release Date
svp        Scarlet & Violet Black Star Promos    165               2023/01/01
sve        Scarlet & Violet Energies             16                2023/03/31
sv1        Scarlet & Violet                      258               2023/03/31
sv2        Paldea Evolved                        279               2023/06/09
sv3        Obsidian Flames                       230               2023/08/11
sv4        Paradox Rift                          266               2023/11/03
sv5        Temporal Forces                       218               2024/03/22
sv6        Twilight Masquerade                   226               2024/05/24
sv7        Stellar Crown                         175               2024/09/13
sv8        Surging Sparks                        252               2024/11/08
sv9        Journey Together                      190               2025/03/28
sv10       Destined Rivals                       244               2025/05/30
sv3pt5     151                                   207               2023/09/22
sv4pt5     Paldean Fates                         245               2024/01/26
sv6pt5     Shrouded Fable                        99                2024/08/02
sv8pt5     Prismatic Evolutions                  180               2025/01/17
zsv10pt5   Black Bolt                            172               2025/07/18
rsv10pt5   White Flare                           173               2025/07/18

The total number of cards in the Scarlet and Violet series is 3,595.

Augmenting the Data

I now have 3,595 images, which means 3,595 classes to work with. One image per class is nowhere near enough data, so I am going to augment each image 200 times using the Python library albumentations, swapping the background for images from the Describable Textures Dataset (DTD) from the University of Oxford.

Let's walk through the code.

We first import all the libraries. The most important ones are cv2 and albumentations.

import asyncio
import os
import albumentations as A
import cv2
import random
import numpy as np
import math

# Self written class to manage async logging
from async_logger_manager import get_logger, AsyncLoggerManager

Setting Up Global Variables

Next we have our global variables. Pay attention to NUM_AUGMENTATIONS_PER_IMAGE, target_size, and image_size. We set NUM_AUGMENTATIONS_PER_IMAGE=200 because we want 200 different data points for each of our classes; 3,595 classes × 200 augmentations gives us 719,000 images to work with. I will talk more about this below, but I realized later that this large dataset was not the most agile to work with.

target_size=512 sets our final image size to 512x512, which is larger than is perhaps recommended for a smaller model like EfficientNetV2-B0, but I will talk about this later too.

image_size=416 sets our Pokémon card image's longest side to 416px. This is because I wanted the card to remain fully visible even with augmentations that include rotations and perspective changes.

DATA_PATH="./data/"

FILE_INPUT_PATH = os.path.join(DATA_PATH, "images/cards/scarlet_violet")
FILE_OUTPUT_PATH = os.path.join(DATA_PATH, "processed_images/cards/scarlet_violet/")
BACKGROUND_IMAGE_PATH = os.path.join(DATA_PATH, "backgrounds/dtd/images/")
NUM_AUGMENTATIONS_PER_IMAGE = 200

target_size = 512
image_size = 416
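
As a quick back-of-envelope check on that 416px choice (this sketch is my own, not part of the pipeline, and assumes the standard 63:88 card aspect ratio), even the worst-case rotated bounding box of a 416px card stays comfortably inside a 512x512 canvas:

import math

# Axis-aligned bounding box of a w x h rectangle rotated by theta:
#   w' = w*cos(theta) + h*sin(theta),  h' = w*sin(theta) + h*cos(theta)
w, h = 298, 416           # ~63:88 card aspect ratio with a 416px longest side
theta = math.radians(15)  # worst case for the rotation range used below

w_rot = w * math.cos(theta) + h * math.sin(theta)
h_rot = w * math.sin(theta) + h * math.cos(theta)
print(round(w_rot), round(h_rot))  # ~396 x ~479, still within 512x512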

Defining the Transformations

Let's run through the transformations:

  • Affine is our standard rotations, scaling, and shearing. I wanted rotation to be noticeable but not enough to make the text on the card hard to read, so I set it to -15 to 15 degrees.
  • Perspective is what it sounds like. I played around with the scale value but left it as default.
  • RandomBrightnessContrast changes the brightness and the contrast. The default values are good enough.
  • RandomShadow places random dark shapes with low opacity on your image.
  • RandomSunFlare overlays a bright sun-flare effect; I kept its probability low.
  • MotionBlur blurs the image. I brought the blur_limit down because it was too much.
  • GaussNoise overlays noise drawn from a Gaussian distribution.
  • ISONoise tries to replicate the camera ISO noise of a photo taken in low light. It was way too intense most of the time, so I lowered the intensity and color_shift ranges.
  • RGBShift changes the color of the image slightly by shifting the values in the Red, Green, and Blue channels.
  • LongestMaxSize resizes the image so its longest side matches the value we give it (image_size=416, as discussed earlier).
transform = A.Compose([
  A.Affine(
    rotate=(-15, 15),
    fit_output=True,
    p=0.8
  ),
  
  A.Perspective(
    fit_output=True,
    p=0.7
  ),

  A.RandomBrightnessContrast(p=0.7),
  A.RandomShadow(p=0.4),
  A.RandomSunFlare(p=0.2),
  A.MotionBlur(blur_limit=(3, 5), p=0.3),
  A.GaussNoise(p=0.3),
  A.ISONoise(intensity=(0.05, 0.25), color_shift=(0.01, 0.03), p=0.3),
  A.RGBShift(r_shift_limit=15, g_shift_limit=15, b_shift_limit=15, p=0.3),

  A.LongestMaxSize(max_size=image_size),  # cap the card's longest side at 416px; target_size is the final canvas
])
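
One property worth verifying (this snippet is a quick check of my own, reusing the transform and image_size defined above, and is not part of the script): because LongestMaxSize runs last, the augmented card can never outgrow the 512x512 canvas.

import numpy as np

# Feed the transform a dummy card-sized RGBA pair and confirm the output
# always fits inside the target_size x target_size canvas.
dummy_rgb = np.zeros((600, 430, 3), dtype=np.uint8)
dummy_alpha = np.full((600, 430), 255, dtype=np.uint8)

for _ in range(100):
  out = transform(image=dummy_rgb, mask=dummy_alpha)
  h, w = out["image"].shape[:2]
  assert max(h, w) <= image_size, f"card too big: {w}x{h}"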

The Image Processing Pipeline

Let's focus in on our main function by breaking it down into a few steps:

  1. First, I get original_images (a list of all images in the input directory) and background_files (a list of the texture-category directories inside the background input directory).
  2. Then we go into the main loop.
  • (2.a) Here we read the image using cv2 with the IMREAD_UNCHANGED flag to include the alpha channel.
  • (2.b) We then convert the image from BGRA to RGBA. This is because OpenCV reads BGRA but albumentations expects RGBA.
  • (2.c) Because Keras can infer the classes from the directory structure, I separate out the base_name to be used as the output path. The set_code was originally used to find the Pokémon card's specific set and its JSON data, but I didn't need it so I edited it out.
  3. The next loop is the actual image processing loop.
  • (3.a) First I pick a random background image from the DTD dataset to be used as the background of the Pokémon card.
  • (3.b) Then I separate the RGBA channels into RGB and A channels (basically separating the colors from the transparency layer) and apply the transformation to both. This way the mask gets the same rotation, zoom, etc. as the actual image. It is important to note that albumentations keeps these channels separate, which leads us to my next step.
  • (3.c) We take the image and the mask from the object albumentations gave us and join them back together to clean the transparent areas. aug_alpha_mask == 0 finds the pixels where alpha=0 and we set those pixels to transparent black ([0,0,0,0]). Then we extract the newly cleaned and processed mask.
  • (3.d) The next part was the hardest part of the whole code. I tried multiple iterations of this, and often either the mask was broken or nothing appeared in the background. Basically, I now want to take the augmented image and place it onto my background.
    • (3.d.1) The images in the DTD dataset come in all different sizes, and since I wanted my final image to be 512x512, I had to resize the DTD background image so it is large enough to crop a 512x512 patch from.
    • (3.d.2) Then I pick a random position in the resized background and crop a 512x512 patch from it.
    • (3.d.3) Then I pick a random position on that crop to place the card, and compute the inverse of the mask we created in step (3.c); the inverse marks where the background should show through. I also extract the RGB portion of the augmented card.
    • (3.d.4) I extract the region of the background that the card will be placed onto, keep the background pixels wherever the card is transparent using mask_inv from the last step, and keep the card pixels using the mask that we created.
    • (3.d.5) The final step is to combine all our previous pieces: the background crop, the background showing through the card's transparent areas, and the card itself. We first combine the card and its background fill, and then place that back onto the full background crop. (A tiny standalone sketch of this masking arithmetic follows right after this list.)
    • (3.d.6) Then we save our image.
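Before the full function, here is that tiny standalone illustration of the masking arithmetic in steps (3.d.4) and (3.d.5). The 2x2 values are made up purely for demonstration:

import cv2
import numpy as np

# Toy composite: place a 2x2 "card" onto a 2x2 background region using a
# binary alpha mask (255 = card pixel, 0 = transparent).
roi = np.array([[[10, 10, 10], [20, 20, 20]],
                [[30, 30, 30], [40, 40, 40]]], dtype=np.uint8)
card = np.array([[[200, 0, 0], [0, 200, 0]],
                 [[0, 0, 200], [50, 50, 50]]], dtype=np.uint8)
mask = np.array([[255, 0],
                 [255, 255]], dtype=np.uint8)

mask_inv = cv2.bitwise_not(mask)
bg_part = cv2.bitwise_and(roi, roi, mask=mask_inv)  # background survives only where mask is 0
card_part = cv2.bitwise_and(card, card, mask=mask)  # card survives only where mask is 255
combined = cv2.add(bg_part, card_part)              # the parts never overlap, so add() merges them

print(combined[0, 1])  # [20 20 20] -> background shows through the transparent corner
print(combined[0, 0])  # [200 0 0]  -> card pixel wins where the mask is set
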
async def main() -> None:
  logger_manager = AsyncLoggerManager.instance()
  logger_manager.init(log_file="scraping.log")
  logger = logger_manager.get_logger("scrapingImages")

  os.makedirs(FILE_OUTPUT_PATH, exist_ok=True)

  # (1) Get list of images and background directories
  original_images = [img for img in os.listdir(FILE_INPUT_PATH) if img.endswith(('.png', '.jpg', '.jpeg'))]
  background_files = [os.path.join(BACKGROUND_IMAGE_PATH, f) for f in os.listdir(BACKGROUND_IMAGE_PATH)]

  logger.info("Starting image processing...")

  # (2) Main loop
  for image_name in original_images:

    # (2.a) Read image with alpha channel
    image_path = os.path.join(FILE_INPUT_PATH, image_name)
    image = cv2.imread(image_path, cv2.IMREAD_UNCHANGED)

    # Guard first: cv2.imread returns None when a read fails
    if image is None:
      logger.warning(f"Failed to read image: {image_path}")
      continue

    # (2.b) Convert from BGRA to RGBA (assumes the card images carry an alpha channel)
    image = cv2.cvtColor(image, cv2.COLOR_BGRA2RGBA)

    # (2.c) Extract base name and set code
    base_name, ext = os.path.splitext(image_name)
    set_code = base_name.split('-')[0]

    logger.info(f"Base name: {base_name}")
    logger.info(f"Set code: {set_code}")

    output_image_path = os.path.join(FILE_OUTPUT_PATH, base_name)
    os.makedirs(output_image_path, exist_ok=True)

    # (3) Image processing loop
    for i in range(NUM_AUGMENTATIONS_PER_IMAGE):

      # (3.a) Select random background image
      bg_set_path = random.choice(background_files)
      background_image_files = [f for f in os.listdir(bg_set_path) if f.endswith(('.png', '.jpg', '.jpeg'))]
      bg_image_path = os.path.join(bg_set_path, random.choice(background_image_files))

      background = cv2.imread(bg_image_path)
      if background is None:
        logger.warning(f"Failed to read background image: {bg_image_path}")
        continue
      background = cv2.cvtColor(background, cv2.COLOR_BGR2RGB)

      # (3.b) Separate RGBA channels and apply transformations
      rgb_card = image[:, :, :3]
      alpha_mask = image[:, :, 3]

      augmented_data = transform(image=rgb_card, mask=alpha_mask)

      # (3.c) Recombine and clean transparent areas
      aug_card_rgb = augmented_data['image']
      aug_alpha_mask = augmented_data['mask']

      augmented_image = np.dstack((aug_card_rgb, aug_alpha_mask))
      background_pixels = aug_alpha_mask == 0
      augmented_image[background_pixels] = [0, 0, 0, 0]

      mask = augmented_image[:, :, 3]

      # (3.d) Place augmented image onto background
      # (3.d.1) Resize background to ensure it is large enough for cropping
      bg_h, bg_w, _ = background.shape
      scale_factor = max(target_size / bg_h, target_size / bg_w)
      new_w = math.ceil(bg_w * scale_factor)
      new_h = math.ceil(bg_h * scale_factor)
      
      bg_resized = cv2.resize(background, (new_w, new_h))

      # This and the if statement below were added before I realized that using math.ceil
      # on the resized width and height already guarantees they are at least target_size (512x512)
      h, w, _ = bg_resized.shape

      if h < target_size or w < target_size:
        logger.warning(
          f"Resized background is too small ({w}x{h}) to crop. Skipping this one."
        )
        continue
      
      # (3.d.2) Randomly crop a 512x512 section from the resized background
      y_start = random.randint(0, h - target_size)
      x_start = random.randint(0, w - target_size)
      bg_crop = bg_resized[y_start:y_start+target_size, x_start:x_start+target_size]

      # (3.d.3) Randomly position the augmented image on the cropped background
      card_h, card_w, _ = augmented_image.shape
      y_offset = random.randint(0, max(0, target_size - card_h))
      x_offset = random.randint(0, max(0, target_size - card_w))

      mask_inv = cv2.bitwise_not(mask)

      aug_card_rgb = augmented_image[:, :, 0:3]


      # (3.d.4) Separate out the card's region of the background and the transparent areas of the card
      roi = bg_crop[y_offset:y_offset+card_h, x_offset:x_offset+card_w]

      bg_part = cv2.bitwise_and(roi, roi, mask=mask_inv)
      card_part = cv2.bitwise_and(aug_card_rgb, aug_card_rgb, mask=mask)

      # (3.d.5) Combine the two parts and place it back onto the background crop
      combined_roi = cv2.add(bg_part, card_part)

      bg_crop[y_offset:y_offset+card_h, x_offset:x_offset+card_w] = combined_roi

      # (3.d.6) Save the final image (base_name was already extracted in step 2.c)
      new_image_name = f"{base_name}_aug_{i+1}.jpg"
      output_path = os.path.join(output_image_path, new_image_name)

      cv2.imwrite(output_path, cv2.cvtColor(bg_crop, cv2.COLOR_RGB2BGR))


  logger.info("Image processing completed")
  logger_manager.stop()


if __name__ == "__main__":
  import sys

  assert sys.version_info >= (3, 10), "Script requires Python 3.10+."

  asyncio.run(main())
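
As a closing aside on step (2.c): the whole point of the per-card output directories is that a Keras-style loader can infer the class labels from the folder names. Here is a minimal sketch, assuming a TensorFlow/Keras training stack; the exact loader call is my assumption, since this post only notes that Keras can derive classes from the directory layout.

import tensorflow as tf

# Hedged sketch: each card's 200 augmentations live in their own subdirectory
# under FILE_OUTPUT_PATH, so Keras can infer the 3,595 class labels directly
# from the folder names.
train_ds = tf.keras.utils.image_dataset_from_directory(
  "./data/processed_images/cards/scarlet_violet/",
  labels="inferred",
  label_mode="int",
  image_size=(512, 512),  # matches target_size
  batch_size=32,
)
print(len(train_ds.class_names))  # one class per card, e.g. "sv1-1"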

Sample Images

These sample images are not ones I will be using to train my model. I originally planned to train on the first few sets, but I do not own enough cards from that era to be able to test the model properly.

base1-1_aug_8
base1-2_aug_9
base1-3_aug_7

Here are some examples of images I actually used in the model:

sv6-12_aug_198
rsv10pt5-152_aug_10
sv2-56_aug_103

In total, from my 3,595 classes, I generated 719,000 images, taking up about 110 GB of storage.